ABSTRACT

During the five years of activities the documentation
centre at the Royal Institute of Technology has established itself as
an information centre in the fields of science and technology. The
SDI service is now well implemented, and it is used and
appreciated by scientists, research workers and engineers at the
universities, research institutions and in the industrial
communities. Techniques for on-line SDI query formulation and query
alteration adaptive to user feedback are under development. The
on-line connection to NASA's RECON system in Darmstadt enables us
to make retrospective searches in interactive mode. Research is going
on to connect the Swedish network for library information systems,
LIBRIS, with international data banks, with the objective of
achieving a comprehensive information retrieval system for the whole

1. INTRODUCTION

The Swedish government has taken an active interest in developing a 
policy for economic growth. In 1967 it launched a program for the 
promotion of technological development and industrial growth, in 
which a plan for the development of scientific and technical infor- 
mation was included. The government was especially interested in 
studying the viability of mechanized information services in the 
field of science and technology, and the utility they could offer 
to users in research and industry. The Royal Institute of Technology 
library was chosen as the responsible agent for the establishment 
of a mechanized service for users in science, industry and education. 

The requirements for the computer operation of such a service had been
thoroughly studied during Tell's years as department manager at the
Swedish nuclear establishment, AB Atomenergi. Then, in 1967, the
Institute library received its first grant of Sw.Cr. 80,000 ($16,000)
to initiate a computerized service in the field of mechanical
engineering. Over the years the scope has extended and the grant has
increased; it has now stabilized at around 1 million Sw.Cr. outside
the ordinary budget of the library. Half of that sum goes to the
salaries of documentalists who have been added to the library staff.
Thus, the fundamental requirements for staff and funds have been
fulfilled by the new policy.

2. THE BASIC TASKS OF AN INFORMATION RETRIEVAL SERVICE

A computerized information system has to perform a number of basic 
functions, such as 

- Entering various types of data 

- Formatting, abbreviating and coding of data 

- Processing information, i.e. searching, matching, sorting etc. 

- Producing standardized or specialized types of output, e.g. 
bibliographies, indexes, SDI etc. 

- Answering specific, one-time requests, i.e. retrospective searches 

- Reacting to various errors 

- Relating to other information systems 
3. THE ORGANIZATION OF A NEW COMPUTERIZED SERVICE

In order to start a computerized service, the best choice, at least
at that time, seemed to be a current awareness service: SDI, Selective
Dissemination of Information. SDI is a system developed by the late Hans
Peter Luhn at IBM in 1959 for alerting participants to new publications
such as journal articles, reports, conference papers etc. The
acronym SDI has the special connotation that the process makes use of
a computer. This is possible when the references to the literature are
stored on machine-readable media.

The system should be so designed that the documents selected and
announced have a high probability of being of interest to the
individual user. For this purpose the user must submit, and routinely
modify, his "interest profile", which serves as the basis for the
computer matching of stored profiles against titles or indexing terms
in the references.
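
The matching step can be sketched in a few lines. This is a minimal illustration, not the ABACUS or VIRA code: it assumes each stored profile is a plain set of upper-case terms and each new reference carries a title plus a list of indexing terms, and the names `Reference`, `Profile` and `run_sdi` are hypothetical. Boolean logic and truncation are left out here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

@dataclass
class Reference:
    ref_id: str
    title: str
    index_terms: List[str] = field(default_factory=list)

@dataclass
class Profile:
    user: str
    terms: Set[str]  # upper-case search terms (hypothetical simplification)

def run_sdi(profiles: List[Profile], new_refs: List[Reference]) -> Dict[str, List[str]]:
    """Match every stored profile against the new references of one period."""
    hits: Dict[str, List[str]] = {p.user: [] for p in profiles}
    for ref in new_refs:
        # Single title words plus whole indexing terms form the searchable vocabulary.
        vocab = set(ref.title.upper().split()) | {t.upper() for t in ref.index_terms}
        for p in profiles:
            if p.terms & vocab:
                hits[p.user].append(ref.ref_id)
    return hits
```

Every reference is compared with every profile here; Section 7 describes how the later programs avoided this brute-force comparison.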

In order to keep the participants' interest alive, the SDI service
must be prepared to offer comprehensive coverage of the literature and
a backup of pertinent material. One of the major tasks in the
expansion of the library service during the past five years has been
to answer the incoming queries, resulting in profiles, as broadly as
possible, and to install new bibliographic data bases whenever they
could contribute to broadening the subject coverage.

By using a general information retrieval system (Tell 1), it has been
possible to include additional files in the service, so that the
search procedure and output routines can be the same. By a "general"
system we mean one that can make use of all the keys, tools and
techniques for selecting references in response to a search request,
e.g. classification schemes, keywords, words in titles or abstracts,
author or author affiliation names, citations etc., all of which can
be used in traditional, manual searches.



4. SOURCES FOR TECHNICAL INFORMATION



SDI system at the Royal Institute of Technology: Databases, 1972

1. ISI  Science Citation Index Source Data Tapes from the Institute for
Scientific Information (USA), containing interdisciplinary
information from the most frequently cited journals in science
and technology, stores about 400 000 references a year.

2. MechEn  Mechanical Engineering from the Royal Institute of
Technology covers the literature in mechanical engineering and
metallurgy and stores about 40 000 references a year.

3. CAC  Chemical Abstracts Condensates from Chemical Abstracts Service
(USA) stores about 340 000 references a year to literature in
the field of chemistry.

4. INSPEC  Information Service in Physics, Electrotechnology, Computers
and Control from the Institution of Electrical Engineers (U.K.)
in collaboration with the Institute of Electrical and Electronics
Engineers (USA). This is the most comprehensive information
system within the fields given in the title, and it stores about
120 000 references a year.

5. Metadex  Metals Abstracts Index Tapes from the American Society for
Metals in collaboration with the Institute of Metals (U.K.) stores
about 24 000 references a year to literature in the field of
metallurgy.

6. Government Reports Announcements from the National Technical
Information Service (NTIS), USA. This information system stores
about 40 000 references a year to reports on US federally sponsored
research in the fields of science and technology.

7. COMPENDEX  Computerized Engineering Index from Engineering Index Inc.
(USA) covers the literature in engineering and technology and stores
about 72 000 references a year.

8. NSA  Nuclear Science Abstracts from the United States Atomic Energy
Commission stores about 50 000 references a year. Literature
searching on the NSA database is carried out in close collaboration
with AB Atomenergi.

9. ABIPC  Abstract Bulletin of the Institute of Paper Chemistry from the
Institute of Paper Chemistry (USA) stores about 10 000 references
a year to recently published articles, patents and theses in the
field of pulp and paper chemistry and technology.

10. WOOD  WOOD from the Swedish Forest Products Research Laboratory and
the Royal Institute of Technology Library, Stockholm, stores about
15 000 references a year in the field of wood technology.

11. FSTA  Food Science and Technology Abstracts from the International
Food Information Service (Germany) covers the literature in
food science and chemistry and stores about 12 000 references
a year.

12. ERIC  ERIC Master Files from the Educational Resources Information
Center (USA) stores about 30 000 references a year to reports,
articles and other publications in pedagogics and modern
educational science.

13. NYFLI  Accession List from the Royal Institute of Technology Library,
Stockholm, annually stores about 7000 titles of literature acquired
by the libraries of AB Atomenergi, Chalmers Institute of
Technology, and the Royal Institute of Technology.

14. STAR  Scientific and Technical Aerospace Reports from the National
Aeronautics and Space Administration (USA) stores about 45 000
references a year to reports from all fields connected with
aeronautics and space technology.

15. IAA  International Aerospace Abstracts from the American Institute of
Aeronautics and Astronautics (USA) stores about 50 000 references
a year to journals, meetings, patents, and other literature in
the same field as STAR.

Databases 14 and 15 are searched at the ESRO documentation centre.



5. IMPLEMENTATION OF THE DATA BASES INTO ABACUS/VIRA

The basic approach employed has been to use a general processing format
into which the records of the different files can be converted by a
reformatting program, so that they can be searched. The success of this
pragmatic approach to the compatibility problem of various tape formats
depends greatly upon the hospitality of the search record format. The
ABACUS format was designed in 1966, before the MARC pilot program and
before the interchange format reflected in the draft International
Standard ISO/DIS 2709, which is foreseen as the standard for UNISIST.
However, the ABACUS record has many characteristics in common with MARC
and ISO. A directory to the whole record maps out the record length, the
data elements present, and the number of characters in each element. The
directory is a fixed field header followed by variable data fields; the
fixed fields give the address and the length of the variable fields.
The items of interest in the external data base are selected, and fields
in the ABACUS format are allocated, by the reformatting program.
Depending on the amount of information on the external tape, the
identification process differs from one format to another.
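
A directory-based record of this kind can be illustrated with a small sketch. The field widths below (a 5-digit record length, a 3-digit field count, and directory entries made of a 4-character tag, a 4-digit length and a 5-digit start address) are assumptions chosen for the example; the actual ABACUS header layout is not given here, although ISO 2709 follows the same leader, directory and data-area idea.

```python
from typing import List, Tuple

# Assumed widths for this sketch only; not the real ABACUS layout.
LEADER_W = 8                      # 5-digit record length + 3-digit field count
TAG_W, LEN_W, POS_W = 4, 4, 5     # one directory entry: tag, length, start address
ENTRY_W = TAG_W + LEN_W + POS_W

def build_record(fields: List[Tuple[str, str]]) -> str:
    """Pack (tag, value) pairs as leader + fixed-length directory + variable data."""
    directory, data, pos = [], [], 0
    for tag, value in fields:
        # The directory entry records where the variable field starts and how long it is.
        directory.append(f"{tag:<{TAG_W}}{len(value):0{LEN_W}d}{pos:0{POS_W}d}")
        data.append(value)
        pos += len(value)
    body = "".join(directory) + "".join(data)
    return f"{LEADER_W + len(body):05d}{len(fields):03d}" + body

def parse_record(record: str) -> List[Tuple[str, str]]:
    """Use the directory to cut the data area back into (tag, value) pairs."""
    if int(record[:5]) != len(record):
        raise ValueError("record length in leader does not match")
    count = int(record[5:8])
    data_start = LEADER_W + count * ENTRY_W
    fields = []
    for i in range(count):
        entry = record[LEADER_W + i * ENTRY_W: LEADER_W + (i + 1) * ENTRY_W]
        tag = entry[:TAG_W].strip()
        length = int(entry[TAG_W:TAG_W + LEN_W])
        pos = int(entry[TAG_W + LEN_W:])
        fields.append((tag, record[data_start + pos: data_start + pos + length]))
    return fields
```

Because each directory entry carries an address and a length, a search program can jump straight to the fields it needs without scanning the whole data area.
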

Among the more extensive formats in the databases are the ERIC Report
Resume Master Data Set and Government Reports Announcements, many of
whose fields are not applicable in the shorter formats of databases
containing references to journal articles. Not all fields in the
different databases are of interest to the users; thus, at present,
some fields are deleted when reformatting into the ABACUS format.
Tables 1 and 2 show the fields of the ERIC Report Resume Master Data
Set and of the International Food Information Service (IFIS) data set,
and their treatment in the ABACUS record. Even if documentation is
provided by a data base producer, the reformatting specification is
written after inspection of tape dumps.

In general, reformatting the different tape formats is rather
straightforward work requiring 30 hours of programming, even when the
formats deviate from the ISO interchange format to which, it is hoped,
they will eventually change. Essentially, the allocation of fields in
the ABACUS program depends on the field identification numbers within
the record types for reports and articles. As can be seen from Tables
1 and 2, the 25 fields in the ERIC format yield 5 fields, and the 17
fields in IFIS yield 8 fields, in the ABACUS set of searchable fields.
Search terms can operate within these fields, since each term is
specified with regard to the type of field in which it is to be
searched.
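
The selection and allocation step can be sketched as a lookup table from source field names to ABACUS field types, with `None` meaning "deleted". The table `ERIC_TO_ABACUS` below is hypothetical and only loosely modelled on the ERIC fields listed in the tables; it does not reproduce the actual searchable, printout and deleted assignments. The `search` helper shows why the field type matters: a term is only tested against fields of the type it was specified for.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical allocation table: source field name -> ABACUS field type,
# or None if the field is deleted during reformatting.
ERIC_TO_ABACUS: Dict[str, Optional[str]] = {
    "Title": "TITLE",
    "Personal Author": "AUTHOR",
    "Descriptor": "KEYWORD",
    "Identifier": "KEYWORD",
    "Abstract": "ABSTRACT",
    "EDRS Price": None,
    "Change Date": None,
}

def reformat(source: List[Tuple[str, str]],
             mapping: Dict[str, Optional[str]]) -> List[Tuple[str, str]]:
    """Keep only allocated fields, relabelled with their ABACUS field type."""
    out = []
    for name, value in source:
        target = mapping.get(name)  # deleted and unknown fields both give None
        if target is not None:
            out.append((target, value))
    return out

def search(record: List[Tuple[str, str]], field_type: str, term: str) -> bool:
    """Substring test, restricted to fields of the requested type."""
    return any(ftype == field_type and term.upper() in value.upper()
               for ftype, value in record)
```
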
Table 1. The reformatting of ERIC Report Resume Master Data Set fields
into the ABACUS format

Field name                        Field identification no.
                                  (hexadecimal)

Sequence                          0000
Add Date                          0001
Change Date                       0002
Accession Number                  0010
Clearinghouse Accession Number    0011
*Other Accession No.              0012
*Program Area                     0014
*Publication Date                 0017
Title                             001A
Personal Author                   001B
*Institution Code                 001C
*Sponsoring Agency Code           0020
Descriptor                        0023
Identifier                        0024
*EDRS Price                       0025
*Descriptive Note                 0026
Issue                             0028
Abstract                          002C
*Report Number                    002D
*Contract Number                  002E
*Grant Number                     002F
*Bureau Number                    0030
*Availability                     0031
Journal Citation                  0032
*Institution Name                 0080
Sponsoring Agency Name            0084

* Not used in CIJE

Table 2. The reformatting of IFIS data set fields into the ABACUS format

Field name                                                  Field identification no.

Year, vol., no., category, running no. of printed abstract  010
Authors                                                     030
Author annotation                                           035
Year                                                        036
Title in English                                            040
Original title if not in English                            041
Title annotation                                            042
Journal name, patent country                                050
Vol., issue, page, patent no.                               055
Number of cited references                                  056
Language                                                    057
Affiliation                                                 058
Abstract                                                    080
Initial of abstractor                                       081
Heading



6. PROFILE CHARACTERISTICS 

The construction and revision of query profiles is an essential task
in an SDI system, demanding an effort both from the user and from the
subject specialist. When a user wants to submit a question to the
SDI system, he is asked to formulate his field of interest in
natural language, i.e. in a normal, narrative way, describing
his interest in some detail. It has proved very useful for the user
also to supply some references to papers which he considers relevant
to his query. He could also provide a list of significant terms and,
if possible, make a draft of the actual search profile. The staff has
prepared a Profile Design Manual which explains the principles of a
computer-operated information retrieval system and describes all
details of profile construction.

The interaction between the staff and the user is essential for a
successful search. On the basis of the user's statements the subject
specialist specifies the question by making a list of significant terms
which might occur as words in the titles of documents. Among the staff
there are subject specialists in education, psychology, business
administration, electrical and mechanical engineering, chemistry,
physics, etc. Furthermore, the list might also include authors,
affiliations, and journal titles. As the system permits searching both
on keywords and on the natural language used in titles, the subject
specialist uses thesauri, handbooks, dictionaries, and any other means
he might find helpful and relevant for the formulation of the profile.
He has to make a special point of checking the printed volumes of the
corresponding databases to find the occurrence of terms when used alone
or in combination with other terms. A generalized flow chart (Fig. 1)
has been constructed by Zofia Gluchowicz (2).

While the keywords must be written exactly as they appear in the Thesaurus
and on the tape, the free-text terms in potential titles can be truncated
both at the beginning and at the end. Truncation facilitates retrieval of
items containing word fragments common to different forms of a word, and
words within words can be searched for. As the examples below show,
suffix (right-hand) truncation occurs very often, while prefix (left-hand)
truncation alone is more unusual. Truncation at both ends, on the other
hand, is more common. For example, the truncated term /CASSETT/, where
the slashes stand for truncation, will retrieve STEREOCASSETTES,
VIDEOCASSETTE, CASSETTE-RECORDER, CASSETTE/CARTRIDGE, etc.
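
The slash convention can be made concrete with a small matcher. This is a sketch under the assumption that a title is split into words at blanks and that truncation applies only at the outer ends of a phrase term; the function names are hypothetical, and the real ABACUS matching rules may differ in detail (for instance, in how punctuation is treated).

```python
def word_matches(stem: str, word: str, left: bool, right: bool) -> bool:
    """Compare one term word with one title word under the truncation flags."""
    if left and right:
        return stem in word            # /STEM/ : fragment anywhere in the word
    if right:
        return word.startswith(stem)   # STEM/  : suffix (right-hand) truncation
    if left:
        return word.endswith(stem)     # /STEM  : prefix (left-hand) truncation
    return word == stem                # STEM   : exact word only

def term_matches(term: str, title: str) -> bool:
    """Match a profile term such as '/CASSETT/' or 'TAPE RECORD/' against a title."""
    left, right = term.startswith("/"), term.endswith("/")
    parts = term.strip("/").upper().split()
    words = title.upper().split()
    n = len(parts)
    # Slide a window of n title words; truncation applies only to the first
    # word (left) and the last word (right) of a phrase term.
    for i in range(len(words) - n + 1):
        if all(word_matches(p, w, left and j == 0, right and j == n - 1)
               for j, (p, w) in enumerate(zip(parts, words[i:i + n]))):
            return True
    return False
```

For example, `/CASSETT/` matches "Stereocassettes for schools", while the untruncated term `TV` does not match "TVs".
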

Fig. 1. Generalized flow chart for profile construction (Z. Gluchowicz)

- The SDI subscriber provides a narrative description of the requested
  information.
- The documentalist checks his interpretation of the subject.
- Statement of specific search words, synonyms, related-subject phrases,
  narrower terms, broader terms, author names, journal titles, author
  affiliations.
- Truncation of terms.
- Construction of search logic, which defines the subject by combining
  groups of search terms using logical operators.
- Search terms and search logic coded for input.
- Search terms and search logic punched and read into computer.
- Subscriber evaluates profile and references (Yes / No).

Supporting material: references relevant to the subject provided by the
customer; thesauri, glossaries, handbooks, indexes and titles in abstract
journals; manual for profile construction; manual for profile input codes.

Profile stored in magnetic tape profile store. Data bases: ISI Source
Tape, MechEn, Chemical Abstracts Condensates, INSPEC, Metals Abstracts
Index, Government Reports Announcements, COMPENDEX, Nuclear Science
Abstracts, Abstract Bulletin of the Institute of Paper Chemistry, WOOD,
Food Science and Technology Abstracts, ERIC Master Files, Scientific and
Technical Aerospace Reports, International Aerospace Abstracts.



As can be seen from Figs. 2-3, the terms are numbered sequentially in
the profile printout to facilitate updating. The terms are also grouped
together, and the groups are indicated by capital letters A, B, C, etc.
Terms, or groups of terms, are linked together in a logical manner by
"and", "or", and "not" logic. The number of terms in one profile
may be up to 150, the maximum allowed in ABACUS; in the new VIRA program
there is no such restriction. However, as the charging policy counts
30 terms as one profile, the average number of terms per profile stays
around 24.
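
How grouped terms and "and", "or", "not" logic combine can be sketched as follows. The sketch assumes that a group counts as satisfied when any of its terms occurs in the title (single-word terms only, with a trailing slash for right-hand truncation) and writes the logic as nested tuples rather than parsing it from text; the names and the tuple notation are hypothetical. The example groups are a few terms taken from Profile 70E.

```python
from typing import Dict, List, Tuple, Union

# A logic expression is a group letter, ("AND" | "OR", e1, e2, ...), or ("NOT", e).
Expr = Union[str, Tuple]

def group_hits(groups: Dict[str, List[str]], title: str) -> Dict[str, bool]:
    """A group is hit when any of its (single-word) terms occurs in the title."""
    words = title.upper().split()

    def hit(term: str) -> bool:
        stem = term.rstrip("/").upper()
        if term.endswith("/"):                      # right-hand truncation
            return any(w.startswith(stem) for w in words)
        return stem in words                        # exact word

    return {g: any(hit(t) for t in terms) for g, terms in groups.items()}

def evaluate(expr: Expr, hits: Dict[str, bool]) -> bool:
    """Evaluate the profile logic over the group hits."""
    if isinstance(expr, str):
        return hits[expr]
    op, *args = expr
    if op == "AND":
        return all(evaluate(a, hits) for a in args)
    if op == "OR":
        return any(evaluate(a, hits) for a in args)
    if op == "NOT":
        return not evaluate(args[0], hits)
    raise ValueError(f"unknown operator {op}")
```

With groups A (`TELEVISION`, `VTR`, `AUDIOVISUAL/`) and B (`RETARDED/`, `HANDICAP/`), the logic `("AND", "A", "B")` selects "Television for the mentally retarded" but not "Television in the home".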

The printout of the profile also includes a description of the query in
natural language, the search logic, and the list of terms classified
according to type, such as words, keywords, author names etc.
The profile printout and every update of it are sent to the user. For
verification, a copy of the profile and a copy of the search results
are kept in the files of the service; these are transferred every
9 months to microfilm cassettes.

The user's responses to early selections, based on the first
approximation of the profile to his field of interest, are used for
improving the profile. Thus, the profile is maintained by adding new
terms and removing old ones which do not give satisfactory results, or
by opening or tightening the logic. False co-ordinations between search
terms from different term groups can also be detected and should be
avoided. When constructing the initial profile we try to choose the
logical strategy considering the user's wishes, and accordingly decide
on the degree of restrictiveness for the initial computer run. Often we
use a less restrictive logic, i.e. not too many "and" or "not"
restrictions, in the initial profile, even if it results in an output
of many irrelevant references (noise), and then, after a few searches,
adjust the profile on the basis of the user's evaluation of the output.



Profile 70E

Subject: Audiovisual aids for the mentally retarded.
Data bases: ERIC, ISI, INSPEC.
Logic: A & B

Term No.  Group  Search terms                  Weight  Term type
010       A      TAPE RECORD/                  2
020       A      VIDEO TAPE RECORD/            2
030       A      EDUCATIONAL TELEVISION/       2
040       A      INSTRUCTIONAL TELEVISION/     2
050       A      AUDIOVISUAL/                  10
060       A      CASSETT/                      2       WORD
070       A      CARTRIDGE/                    2
080       A      EVR                           2
090       A      VTR                           2       WORD
100       A      VCR                           2       WORD
110       A      ETV                           2       WORD
120       A      ITV                           2       WORD
130       A      CTV                           2       WORD
140       A      SELECTAVISION/                2       WORD
150       A      TELEVISION                    2       WORD
160       A      TV                            2       WORD
170       A      /VIDEO/                       2       WORD
180       A      CARTRIVISION/                 2       WORD
190       A      8MM/                          2       WORD
200       A      AUDIOVISUAL/                  10      WORD
210       A      AV                            10      WORD
220       A      A-V                           10      WORD
230       A      VIDICORD/                     10      WORD
240       A      VISUAL AID/                   10      WORD
250       A      MEDIA/                        2       WORD
260       A      PICTURE/                      2       WORD
270       A      LONG-DISTANC/                 10      WORD
280       A      AUDIO-VISUAL                  10      WORD
290       B      EDUCATIONALLY DISADVANTAG/    2       KEYWORD
300       B      LOW ABILIT/                   2       KEYWORD
310       B      SLOW LEARNER/                 2       KEYWORD
320       B      MENTALLY HANDICAP/            10      KEYWORD
330       B      EDUCABLE MENTALLY HANDICA/    10      KEYWORD
340       B      RETARDED/                     10      KEYWORD
350       B      RETARDATION/                  10      KEYWORD
360       B      MENTAL RETARDATION/           10      KEYWORD
370       B      EXCEPTIONAL/                  2       KEYWORD
380       B      SPECIAL/                      2       KEYWORD
390       B      RETARD/                       10      WORD
400       B      LOW/                          2       WORD
410       B      SLOW/                         2       WORD
420       B      FAILUR/                       2       WORD
430       B      DISADVANTAG/                  2       WORD
440       B      HANDICAP/                     2       WORD
450       B      BELOW/                        10      WORD
460       B      EXCEPTION/                    2       WORD
470       B      DROPOUT/                      2       WORD

In total 45 search words, of which 14 are keywords from the ERIC Thesaurus.



Profile 26U

Subject: Electronic circuits and systems
Data bases: INSPEC
Logic: A + B*(C + D + E) + C*(E + F + G + H + K + L + M) +
       (L*M + E)*(K + M) + G*N + P*(H + R + B*K) - S

Term No.  Group  Search terms                  Weight  Term type
0008      A      *AUTOMATA THEORY*             2       WORD
0009      A      *COMPUTER DESIGN*             2       WORD
0010      A      *DIGITAL SYSTEM*              2       WORD
0011      A      *LOGIC SYSTEM*                6       WORD
0012      A      *MACHINE LOGIC*               2       WORD
0013      A      *SEQUENTIAL MACHINE*          6       WORD
0014      A      *SYNCHRONOUS SYSTEM*          2       WORD
0015      B      *NETWORK*                     6       WORD
0016      C      *LOGIC*                       6       WORD
0017      D      *DIGITAL*                     2       WORD
0018      E      *SEQUENTIAL*                  2       WORD
0019      F      *ALGORITHM*                   2       WORD
0020      F      *AUTOMAT*                     2       WORD
0021      F      *COMBINAT*                    2       WORD
0022      F      *PARTITION*                   2       WORD
0023      G      *FUNCTION*                    2       WORD
0024      H      *SIMULAT*                     2       WORD
0025      K      *SYNTHESIS*                   2       WORD
0026      L      *LANGUAGE*                    2       WORD
0027      M      *DESIGN*                      2       WORD
0028      N      *MULTIPLE OUTPUT*             2       WORD
0029      N      *MULTI-VALUE*                 2       WORD
0030      P      *B03*                         2       CLASSIFICATION CODE
0031      P      *B046*                        2       CLASSIFICATION CODE
0032      P      *C90*                         2       CLASSIFICATION CODE
0033      P      *C92*                         2       CLASSIFICATION CODE
0034      P      *C93*                         2       CLASSIFICATION CODE
0035      R      *NAND*                        2       WORD
0036      R      *NOR*                         2       WORD
0037      R      *FLIPFLOP*                    2       WORD
0038      R      *FLIP FLOP*                   2       WORD
0039      R      *MINIMI*                      2       WORD
0040      S      *FILTER*                      98      WORD


7. PROCESSING METHODS AND COSTS 

An inevitable characteristic of large retrieval systems is that a
strategy for searching a small or medium-sized data base may differ
significantly from a search strategy for a large one. During the five
years, our search methods have progressed from the simple masking-off
technique, with search times proportional to the number of references
and of terms in the profiles, to a more elaborate technique making use
of hash coding and tree-structure searches, so that search time now
grows almost logarithmically as the number of terms in the profiles
grows. The newest program, VIRA, written by Rolf Larsson, is run in
parallel with ABACUS (Zennaki 3). The present profile program, PROSA,
comprises 2,500 COBOL statements, and the VIRA search program 2,000
statements in IBM assembler language.
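
One way to picture the move from masking-off to hash coding and tree searches is sketched below. Instead of comparing every title word with every profile term, the profile terms are indexed once: exact words in a hash table (here a Python `dict`) and right-truncated stems in a character tree (a trie), so each title word costs one hash lookup plus a walk no longer than the word itself. This is an illustrative reconstruction under those assumptions; `TermIndex` is a hypothetical name and the sketch does not describe the VIRA internals.

```python
from collections import defaultdict
from typing import Dict, Set

_END = None  # trie key meaning "a stem ends here"; cannot clash with a character

class TermIndex:
    """Profile terms indexed once, so each title word needs only a few lookups."""

    def __init__(self) -> None:
        self.exact: Dict[str, Set[str]] = defaultdict(set)  # word -> profile ids
        self.trie: dict = {}                                 # right-truncated stems

    def add(self, profile_id: str, term: str) -> None:
        term = term.upper()
        if term.endswith("/"):
            node = self.trie
            for ch in term[:-1]:
                node = node.setdefault(ch, {})
            node.setdefault(_END, set()).add(profile_id)
        else:
            self.exact[term].add(profile_id)

    def lookup(self, word: str) -> Set[str]:
        """Return the ids of all profiles with a term matching this title word."""
        word = word.upper()
        found = set(self.exact.get(word, ()))   # hash lookup for exact words
        node = self.trie
        for ch in word:
            if _END in node:                    # a stored stem is a prefix of the word
                found |= node[_END]
            node = node.get(ch)
            if node is None:
                break
        else:                                   # whole word walked: stem equals word
            found |= node.get(_END, set())
        return found
```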

In order to carry out a rough check of profile performance on a
"management by exception" basis, two statistical tools have been
developed. The critical cases in the printout to a user are (1) an
abundance of references, and (2) no printout at all. In order to reveal
these extremes, every search produces search statistics indicating the
number of references for each profile. The form is designed like the
scale of the speedometer in many cars: the longer the row of "stars",
the more reason to put one's foot on the brake. Fig. 4 displays part of
the search statistics for a run on ERIC. The columns give the number of
references for the first digit, the second digit, etc. Thus, the first
profile has resulted in 6+40 = 46 references, and the second in
8+60+300 = 368 references. Profile No. 26R, on the other hand, has given
no output. Furthermore, at the bottom of the form an indication is given
of which profiles have received no hits, and which have received more
than 40 hits.
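
The digit-column layout of the statistics form can be reproduced as below. The sketch assumes that each column holds one decimal digit of a profile's hit count, drawn as that many stars (units first), and that profiles with no hits or with more than 40 hits are listed separately; the function names and profile identifiers are hypothetical.

```python
from typing import Dict, List, Tuple

def digit_columns(count: int) -> List[int]:
    """Split a hit count into its decimal digits, units first: 368 -> [8, 6, 3]."""
    return [int(d) for d in reversed(str(count))]

def search_statistics(hits: Dict[str, int], limit: int = 40
                      ) -> Tuple[List[str], List[str], List[str]]:
    """Return star rows plus the profiles with no hits and with more than `limit` hits."""
    rows, no_hits, too_many = [], [], []
    for profile, n in hits.items():
        # One column per digit, drawn as that many stars (units, tens, hundreds, ...).
        stars = " | ".join("*" * d for d in digit_columns(n))
        rows.append(f"{profile:<6}{stars}")
        if n == 0:
            no_hits.append(profile)
        elif n > limit:
            too_many.append(profile)
    return rows, no_hits, too_many
```

For 368 hits the row shows 8, 6 and 3 stars, which is the 8+60+300 reading of the form.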

These search statistics indicate where the exceptional cases are
located among the profiles. The next step is to analyse what causes the
lack of hits or the great number of hits. To investigate the latter
case, a listing is also given for every profile stating which terms or
term combinations have caused the printout, including the frequencies
of these terms. In Fig. 5, for example, the first step would be to
analyse the combination MEASUREMENT TECHNIQUES and MEASUREMENT
INSTRUMENTS, which occurs 13 times, perhaps in order to change the logic
or to place these words in separate groups if they have given rise to
many irrelevant references. The second column in Fig. 5 indicates the
weights we are experimenting with, which will be discussed later on.
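
The coincidence listing can be approximated by counting, over all references a profile retrieved, how often each pair of its terms occurred together. This sketch assumes that the search run records which terms matched each reference, and it counts pairs only; the function name is hypothetical, and the sample term TRANSDUCER in the usage is invented for illustration.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, List, Tuple

def coincidences(matched_terms_per_hit: Iterable[List[str]]
                 ) -> List[Tuple[Tuple[str, ...], int]]:
    """Count co-occurring term pairs over a profile's hits, most frequent first."""
    counter: Counter = Counter()
    for terms in matched_terms_per_hit:
        # Sorting makes (X, Y) and (Y, X) the same pair.
        for pair in combinations(sorted(set(terms)), 2):
            counter[pair] += 1
    return counter.most_common()
```

A pair at the top of this list, such as MEASUREMENT INSTRUMENTS with MEASUREMENT TECHNIQUES, is the first candidate for separating the terms into different groups.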



8. SEARCHING KEYWORDS AND WORDS IN TITLES



The ABACUS program is designed to process natural language by searching
titles and/or abstracts. In the case of one data base, the Science
Citation Index Source Tapes (the ISI tapes), which cover 2,000 journals,
there are no keywords or subject indicators other than the titles.
Thus, free-text search is the only way to open these files. Free-text
search can be regarded as using a set of skeleton keys to open up any
machine-readable file. Some files make use of keywords chosen from a
corresponding thesaurus of descriptors. Searching these keywords is an
additional means for the subject specialist or the user to improve
search performance on the files containing keywords compared with the
ISI tapes. When a data base contains keywords, we have recommended that
they be used in combination with words in natural language. In a
multi-data-base environment the same natural-language profile can easily
be used on various data bases, while the use of keywords is restricted
to each specific data base, which has to be taken into account when
formulating the profile. Many of our profiles are searched on several
databases, since our main principle is to answer the query in its
broadest sense, regardless of which data base the responding references
stem from.
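
The rule that free-text words travel across data bases while keywords stay with their own thesaurus can be expressed as a simple filter. This is a sketch under the assumption that each profile term is tagged WORD or KEYWORD, as in the profile printouts, and that a keyword records which data base's thesaurus it comes from; `Term` and `terms_for` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Term:
    text: str
    kind: str                        # "WORD" (free text) or "KEYWORD" (descriptor)
    database: Optional[str] = None   # thesaurus the keyword belongs to

def terms_for(profile: List[Term], database: str) -> List[str]:
    """Free-text words are used on every data base; keywords only on their own."""
    return [t.text for t in profile
            if t.kind == "WORD" or (t.kind == "KEYWORD" and t.database == database)]
```

A profile built this way can be sent unchanged to several data bases, and only the keyword part shrinks when the target lacks the matching thesaurus.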

Especially for questions of an inter-disciplinary nature, it is obvious
that they should be processed on several data bases in order to assure
good coverage. It is true, however, that the reformulation of a query
into a profile for the SDI system takes place in a kind of dialogue
with the computer, focusing on one data base at a time and considering
both the terminology used in free text and the metalanguage of keywords
or other subject indicators. In order to arrive at a standardization of
query formulation, allowing for different degrees of complexity of
natural text and metalanguages, a method has been developed for
translation between the various scientific disciplines reflected in the
data bases, by generating vocabularies and concordances between words
in natural language and the various thesauri used.

We have started work in this area by compiling word frequency lists for
various data bases such as ERIC, CAC, INIS and ISI. These frequency
lists have shown that the use of language (the scientific "jargon")
differs between disciplines. For instance, the first significant word
was REACTOR in INIS (nuclear energy), ACID in CAC (organic chemistry),
and EDUCATIONAL in ERIC.

Frequencies of coincidences of profile 64G.

The non-informative
words such as FOR and TO occur in almost the same order in these data bases.
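
A word frequency list of the kind described can be compiled from titles as sketched below. The tokenisation (runs of letters, digits and hyphens, upper-cased) and the optional stop-word set are assumptions made for the example; the procedure actually used for the lists is not given here.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

def word_frequencies(titles: Iterable[str],
                     stopwords: frozenset = frozenset()) -> List[Tuple[str, int]]:
    """Count title words across a data base, most frequent first."""
    counts: Counter = Counter()
    for title in titles:
        for word in re.findall(r"[\w-]+", title.upper()):
            if word not in stopwords:
                counts[word] += 1
    return counts.most_common()
```

Passing a stop-word set such as `{"FOR", "TO"}` removes the non-informative words, so that the first entries of each list show the discipline's own vocabulary.
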

The following remarks, based upon our experience, might illuminate the
efficiency of descriptors in a thesaurus. The combined search strategy
we use, mixing keywords and words in free text, reveals that the present
indexing habit in some data bases of using keywords identical to words
in the titles is futile. If some of the keywords instead took the place
of broad subject categories, it would add a new dimension to the search.
This is, for instance, the case with the data base INSPEC.

A study should also be^made about the proportion of titles that are 
not useful as content indicators and, thus, not suitable for free text 
searching. If only a small amount of titles are meaninqless, a human 
indexing using thesaurus keywords should be ques*tionned. 
On the other hand, if something needs to be done, especially if we
believe that keyword indexing is necessary for the quality of printed
indexes or for future on-line retrieval systems of the RECON type, title
augmentation or automated keyword assignment seem to be attractive
alternatives to expensive human indexing. Such a strategy might cause
authors to improve the information content of their titles, as has
happened in areas where the KWIC indexing technique is used.
Because of the costs of indexing, we could never afford it for our own
data base in mechanical engineering and the wood, paper and pulp
industry, which covers 250 journals (60,000 references/yr) in three
languages. Only title augmentation is permitted, and only for short
titles (less than 60 characters). We know that we can satisfy the users
by free-text searching alone, because at present we receive orders for
several hundred photocopies a month as a result of the output.
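The short-title rule can be expressed as a small function. It is a sketch only: the 60-character limit comes from the text, while the helper name `augment_title`, the `/` separator and the assumption that the extra terms are supplied by an indexer are hypothetical.

```python
# Limit taken from the text: titles shorter than 60 characters are augmented.
SHORT_TITLE_LIMIT = 60

def augment_title(title, extra_terms):
    """Return the text to be searched for one reference (hypothetical helper).

    Titles of 60 characters or more are searched as they stand; shorter
    titles get extra content-bearing terms appended after a separator.
    """
    if len(title) < SHORT_TITLE_LIMIT and extra_terms:
        return title + " / " + " ".join(extra_terms)
    return title

print(augment_title("Chip formation", ["METAL", "CUTTING"]))
# Chip formation / METAL CUTTING
```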



9. EVALUATION AND FEED-BACK

At present 1100 users receive SDI service on our data bases. After
five years of operation on tapes, we generally feel that we are
still just scratching the surface of computerized information retrieval.
We think, for instance, that the printout we now deliver as answers to
the queries should undergo further refinement before reaching the user.
When we consider the construction of a profile as reflecting a specific
query, it is difficult to provide a measure of its effectiveness, especially
as our practice is to retrieve references from multiple files. Questions
about recall and precision then lose interest. The essential measure
we can assess is the user's satisfaction, which can be expressed on a
scale from highly relevant to irrelevant, or by counting the number of
documents he orders.

Computer time and costs are other factors which can be measured. A
distinction can be made between computer costs, plus the costs of the
tapes and the subscription fees for the profiles, and other costs, e.g.
the construction of the profiles, which are defined as common library costs.

The delay time for the same reference appearing in the various services
has been studied. We know that ISI is much faster than COMPENDEX or
INSPEC, and also than ERIC. However, delay time often does not have
a significant effect on the user. Instead, it happens that when he receives
an early reference he judges it to be of low interest or irrelevant,
while the same reference appearing 3-6 months later is evaluated as
very interesting, and he orders a copy. In several cases it seems that
the continuous SDI service has a sort of learning effect on the user.

10. METHODS TO ESTABLISH A HELPFUL OUTPUT ORDERING 

This paper is not intended as a primer on information retrieval,
but the reader may already have noticed in Figs. 2, 3 and 5 that
there are indications of a weighting procedure (VIKT = WEIGHT). We
should, therefore, like to mention that we are experimenting with
various weighting methods in order to establish a helpful ordering
of the output, so that references early in the list have a higher
probability of interest to the individual user than later ones.
The method shown in Figs. 2 and 3 is based upon the assumption that
the words used in the profile and the words occurring in a reference
are related in such a way that the more the words co-occur, the higher
the probability that the reference is relevant to the query.
This gives us one way of ordering the output. We note the number
of co-occurrences and let the search logic operate arithmetically to
arrive at the values upon which we base the ordering. As can be seen
from profile 70E in Fig. 2, the weight 2 is in general assigned
to all terms. However, the user has regarded some terms as of greater
importance and assigned the weight 10 to them. The three words which
pick up the first reference in the printout in Fig. 6 all have the
weight 10, and two of them are in the same term group, thus 10 + 10.
The logical Boolean operation "and" is translated into multiplication,
so the complete expression will be 10 x (10 + 10) = 200, as the weight
shows. The four words which pick up the first reference in the
printout in Fig. 7 have been given the following weights in the
profile (see Fig. 3): NETWORK 6, LOGIC 6, C 92 2, NAND 2. According
to the search strategy of this profile, the reference receives the
weight 40. In this case the ordering seems to have worked to the user's
satisfaction, since he has ordered a copy by circling the reference.
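The arithmetic behind these weights can be sketched as follows, assuming a profile made of term groups: terms within a group are alternatives whose matching weights are added, and the groups are joined by AND, which becomes multiplication. The group structure, the function `reference_weight` and the terms GEAR, NOISE and VIBRATION are hypothetical; only the weights 10 and 2 and the rule "and = multiplication" are taken from the text.

```python
from math import prod

def reference_weight(profile, words):
    """Weight a reference against a profile of AND-ed term groups.

    profile: list of term groups, each a dict mapping term -> weight.
    Matching terms inside one group (OR) have their weights added;
    the group sums are then multiplied (AND), so a group with no
    matching term makes the whole weight 0.
    """
    group_sums = [
        sum(weight for term, weight in group.items() if term in words)
        for group in profile
    ]
    return prod(group_sums)

# Hypothetical profile: default weight 2, important terms given weight 10.
profile = [
    {"GEAR": 10, "GEARS": 2},
    {"NOISE": 10, "VIBRATION": 10, "SOUND": 2},
]
words = {"GEAR", "NOISE", "AND", "VIBRATION", "MEASUREMENT"}
print(reference_weight(profile, words))  # 200, i.e. 10 x (10 + 10)
```

With one matching term of weight 10 in the first group and two in the second, the sketch reproduces the value 200 quoted for Fig. 6. A reference that matches nothing in one group gets weight 0, which corresponds to the Boolean AND failing.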
Usually we do not encourage the user to put in subjectively assigned
weights, as we would like to find out more about objectively assigned
weights. This brings us back to the list of word frequencies dealt with
under Chap. 7. We could order the references based upon the frequencies
of the words in the data base, which is our next step in preparation.
The underlying reasoning is as follows.
When forming the logical expression in a keyword-based system arranged
as an inverted file, it is common to base the logical expression upon
the number of documents attached to each keyword. This number indicates
the frequency with which the keyword has been used for indexing. Thus,
on-line searches on a display terminal usually end by forming the logical
expression that gives the minimum output. This means that high-frequency
terms are regarded as having less value than those with low frequencies.
In a free-text search system in batch processing mode, a search can
also be based upon term frequencies in natural language, if we build
a frequency table from a large sample of references from each data base,
say around 30,000 references. The values for ordering could then be
established as the sum of the values of the co-occurring terms, if these
are expressed as 1/n, where n is the frequency of the term given by the
frequency table (Tell 4). Such frequency tables are under construction
for several data bases.
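A sketch of the proposed 1/n ordering, under stated assumptions: n is taken here to be the number of sample references containing the term (counting total occurrences would be an equally valid reading), words are split on blanks, and the four invented sample references stand in for the roughly 30,000 the text suggests. The names `build_frequency_table` and `ordering_value` are hypothetical.

```python
from collections import Counter

def build_frequency_table(sample_references):
    """Map each word to n, the number of sample references containing it."""
    table = Counter()
    for text in sample_references:
        table.update(set(text.upper().split()))
    return table

def ordering_value(profile_terms, reference_text, table):
    """Sum 1/n over the profile terms that co-occur in the reference."""
    words = set(reference_text.upper().split())
    return sum(1 / table[term] for term in profile_terms
               if term in words and table[term] > 0)

# Tiny invented sample standing in for about 30,000 references.
sample = [
    "fatigue of welded steel joints",
    "steel production in sweden",
    "corrosion of steel pipes",
    "fatigue cracks in aluminium",
]
table = build_frequency_table(sample)
profile_terms = ["FATIGUE", "STEEL", "WELDED"]
new_refs = ["steel rolling mills", "fatigue of steel shafts", "welded steel frames"]
ranked = sorted(new_refs, key=lambda r: ordering_value(profile_terms, r, table),
                reverse=True)
for ref in ranked:
    print(f"{ordering_value(profile_terms, ref, table):.2f}  {ref}")
# 1.33  welded steel frames
# 0.83  fatigue of steel shafts
# 0.33  steel rolling mills
```

A rare term such as WELDED (n = 1) contributes more than a common one such as STEEL (n = 3), which mirrors the on-line practice of preferring low-frequency keywords.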

The weighting procedure is only the first step. We are going to study
parsing and computational linguistic methods in order to find out what
contribution such methods can make to output ordering. We hope to
arrive at shorter lists by introducing a cut-off when the weights are too
low, thus saving computer and user time.
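The cut-off can be sketched as a truncation of the ordered list. The text gives no threshold, so the value 10 below is purely illustrative, as are the reference labels and the weights 4 and 8; the weights 200 and 40 are the values quoted for Figs. 6 and 7.

```python
from itertools import takewhile

def cut_off(weighted_refs, threshold):
    """Order (reference, weight) pairs by falling weight and stop at the
    first weight below the threshold, so low-weight references are dropped."""
    ranked = sorted(weighted_refs, key=lambda pair: pair[1], reverse=True)
    return list(takewhile(lambda pair: pair[1] >= threshold, ranked))

output = [("ref A", 200), ("ref B", 4), ("ref C", 40), ("ref D", 8)]
print(cut_off(output, threshold=10))  # [('ref A', 200), ('ref C', 40)]
```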



11. PERSONNEL AND TRAINING

Being responsible for exploring the utility of computerized information
services to scientific research, higher education and industry, we have
felt that one task has been to carry out research and development of the
kind described above. The other tasks are production, management,
clerical support, and supporting library service. The overall staff
for running the SDI service is 12 full-time equivalents. The number of
subject specialists is 8, clerical equivalents 4, and programmers 1.
In the present transitional state, operating with two systems, ABACUS
and VIRA, profile updating is laborious, which has made it difficult,
for example, to devote time to the construction of group profiles of
interest in several areas. SDI is tailor-made for the individual,
requires the personal attention of the subject specialist, and becomes
relatively time-consuming, while group profiles are cheaper to update,
since they need not be adapted to individual requirements.
The library back-up service has also been put under pressure since the
introduction of the SDI service. Even if requests for copies of the
references output from some files are shifted to other libraries
where certain microfiche collections are located, most references to
journal articles and technical reports are handled by our library from
its collections or by inter-library loans. In many cases photocopies are
ordered from the National Lending Library in Boston Spa, U.K. This
follow-up service is found to be important in order to keep the interest
of the users.

The effectiveness of the search profile depends to a high degree
on the active interest of the subscriber. The user is better able to
influence the effectiveness of his search profile if he knows the basic
principles of the computer-operated information retrieval system and of
profile construction technique. Therefore we have organized one-, two-
and ten-day educational seminars with lectures and exercises in profile
construction, see Tables 3-5. Research engineers, production engineers and
draftsmen of different levels have participated in these seminars. All
of them had encountered the increasing need for up-to-date information
in their daily work. The participants were not only informed about the
principles of the SDI system, but were also given an introduction to
manual information retrieval methods, see Table 5. This was done because
the initial intellectual effort placed on the user, when he has to
define his problem, is the same for both methods of information retrieval.
Seminar on the structure and use of scientific and technical
literature for scientists, engineers, and technicians.

Programme

Day 1.

Morning     Introduction to seminar. Tour of the library.
            Structure of scientific and technical literature.
            Guides to primary and secondary information sources.
            The technique of literature search by conventional methods.

Afternoon   Practical work:
            Training in the use of scientific literature.
            Participants perform literature searches on specially
            chosen items.
            Discussion of seminar.

Day 2.

Morning     Special libraries, information centres, documentation
            services.
            Computerized information retrieval:
            the SDI system at the Institute, profile performance
            and users' feedback.

Afternoon   Practical work:
            Participants perform profiles on chosen items.
            Discussion of seminar.

Zofia Gluchowicz
Seminar on the SDI system at the Royal Institute of Technology
(Selective Dissemination of Information)

Programme
Day 1. Morning
       Afternoon
Day 2. Morning
       Afternoon

Introduction to seminar.

SDI from the user co-ordinator's viewpoint.
Description of data bases, profile performance, feedback,
evaluation, profile adjusting.

SDI from the users' viewpoint.
SDI users relate their experience of the SDI service.

Practical work:
Training in profile performance on items chosen by the participants.

SDI from the system designer's and the programmer's viewpoints.

Practical work continued as above.

Development trends and future prospects of computerized
information retrieval.

Discussion of seminar.
One-day seminar on the SDI system at the Royal Institute of Technology
(Selective Dissemination of Information)

Programme

Morning     Introduction to seminar.
            Presentation of the tape service and subject
            categories covered.
            Profile construction for SDI service, evaluation, feedback.

Afternoon   Practical work:
            Participants perform individual search profiles
            for searching on the different tapes.
            Discussion of seminar.

About 70 engineers and scientists participated in the seminars.
The user will more easily associate the new technique with the
traditional methods, and he will be better aware of what the SDI
service can offer regarding literature coverage and timeliness.
In this way interest in the SDI service has been intensified,
and the user takes a more active part in handling the profiles.
These seminars are much appreciated and are given in different
parts of Sweden. Lectures on and training in profile construction
have also been included in the fourth-year curriculum for
students of the Institute. The courses have been given by the
library staff.

During the two-month course in information and documentation
techniques for graduates in science and technology, 60 hours were
reserved for lectures and training in computerized documentation
and profile construction.

Our experience from trying to market the data bases to scientists
and people in industry has been that the most effective means is
one-day seminars in which the afternoon session is devoted to group work,
where every participant, under the guidance of one of our staff, constructs
a profile in his field of interest, see Table 5. We then promise to
run it on a trial basis free of charge for a few months. This
procedure of "taking the service to the user" has proved successful
in attracting potential users.

12. THE ON-LINE INTERACTIVE MODE 

We have now arrived at the stage where, as an information centre, we have
started to use terminal equipment for on-line access to computer-stored
information in large data banks. The salient component in this
man-machine interactive system is the remote console. In our case
it is a portable input/output terminal which generates and displays
information on a standard television receiver, accepts information from
a keyboard and communicates with the computer, which recognizes our
signals. The information on the television screen can also be selectively
transmitted to a classical teletype terminal at our end, or ordered to
come out on the line printer at the data bank centre.
The documentalist, as the intermediary between the inquirer and the stored
information, and/or the inquirer himself can negotiate through
the terminal with the computer processing the search on the data bank.
At present we have a direct connection with ESRO's (European Space Research
Organisation) Computer Center in Darmstadt, where about one million
references are stored in the following files:



Files                                           Number of     From year
                                                references

1. Scientific and Technical Aerospace                           1962
2. International Aerospace Abstracts - IAA        105 000       1969
3. Computerized Engineering Index - COMPENDEX                   1969
4. Metals Abstracts Index - Metadex                79 000       1969
5. Nuclear Science Abstracts - NSA                190 000
6. Government Reports Announcements - GRA          55 000       1970
7. Electronic Components Databank                   4 271       1970

The Chemical Abstracts Condensates file is being tested.

The total yearly updating amounts to about 280 000 references.
13. CONCLUSION 

During its five years of activity, the documentation centre at the
Royal Institute of Technology has established itself as an information
centre in the fields of science and technology.
The SDI service is now well implemented, and its activities are used
and appreciated by scientists, research workers and engineers at the
universities, research institutions and in the industrial communities.
Techniques for on-line SDI query formulation and query alteration
adaptive to user feedback are under development.
The on-line connection to NASA's RECON system in Darmstadt
enables us to make retrospective searches in interactive mode.
Research is going on to link the Swedish network for Library
Information System (LIBRIS) with international data banks, with
the objective of achieving a comprehensive information retrieval system
for the whole country.